Hybrid Clustering of Text Mining and Bibliometrics Applied to Journal Sets

Authors

  • Xinhai Liu
  • Shi Yu
  • Yves Moreau
  • Bart De Moor
  • Wolfgang Glänzel
  • Frizo A. L. Janssens
Abstract

To exploit the correlated and complementary information contained in text mining and bibliometrics, hybrid clustering that incorporates both textual content and citation information has become a popular strategy. In this paper, we propose a new computational framework that integrates text mining and bibliometrics to provide a mapping of journal sets. Two categories of hybrid clustering methods are applied. The first is ensemble clustering, which combines the clustering results obtained from individual data sources into a consolidated clustering result. The second is kernel fusion, which maps heterogeneous data sets into kernel space and combines the kernel matrices for clustering. Kernels can be combined either by simple averaging or by an optimized weighted linear combination. We also propose a novel adaptive kernel K-means clustering algorithm that combines textual content and citation information for clustering. The proposed algorithm is systematically compared with other methods on a clustering problem involving 1869 journals published in 2002-2006. Based on several validation indices, the experimental results demonstrate that our hybrid clustering strategy provides clustering results as good as those of the best individual data source.
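The kernel-fusion idea described in the abstract can be illustrated with a minimal sketch. It covers only the simple averaging variant followed by plain kernel K-means, not the adaptive algorithm the paper proposes. The function names, the cosine kernel, the farthest-point initialisation, and the toy data are all assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def cosine_kernel(X):
    """Cosine-similarity kernel: K[i, j] = <x_i, x_j> / (|x_i| |x_j|)."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    return X @ X.T

def average_kernels(kernels):
    """Unweighted kernel fusion: arithmetic mean of the kernel matrices."""
    return sum(kernels) / len(kernels)

def farthest_point_init(K, n_clusters):
    """Deterministic start (an assumption): pick mutually distant seed points."""
    diag = np.diag(K)
    seeds = [0]
    d = diag + diag[0] - 2.0 * K[:, 0]          # squared distance to seed 0
    for _ in range(1, n_clusters):
        nxt = int(d.argmax())
        seeds.append(nxt)
        d = np.minimum(d, diag + diag[nxt] - 2.0 * K[:, nxt])
    D = diag[:, None] + diag[seeds][None, :] - 2.0 * K[:, seeds]
    return D.argmin(axis=1)

def kernel_kmeans(K, n_clusters, n_iter=50):
    """Plain kernel K-means operating on a precomputed kernel matrix K."""
    n = K.shape[0]
    diag = np.diag(K)
    labels = farthest_point_init(K, n_clusters)
    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            members = labels == c
            size = members.sum()
            if size == 0:
                continue  # an empty cluster keeps infinite distance
            # Squared feature-space distance from every point to the mean of c.
            dist[:, c] = (diag
                          - 2.0 * K[:, members].sum(axis=1) / size
                          + K[np.ix_(members, members)].sum() / size**2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Toy data (hypothetical): 10 "journals", two groups of 5, seen through
# a text view and a citation view that share the same group structure.
rng = np.random.default_rng(0)
base = np.vstack([np.tile([1.0, 0.0, 0.0], (5, 1)),
                  np.tile([0.0, 1.0, 0.0], (5, 1))])
text_view = base + 0.05 * rng.standard_normal(base.shape)
citation_view = base + 0.05 * rng.standard_normal(base.shape)

K_fused = average_kernels([cosine_kernel(text_view),
                           cosine_kernel(citation_view)])
labels = kernel_kmeans(K_fused, n_clusters=2)
```

Replacing `average_kernels` with a weighted sum would correspond to the weighted linear combination mentioned in the abstract; how those weights are optimized is not specified here and is left out of the sketch.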

Similar resources

Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database

We propose a new hybrid clustering framework to incorporate text mining with bibliometrics in journal set analysis. The framework integrates two different approaches: clustering ensemble and kernel-fusion clustering. To improve the flexibility and the efficiency of processing large-scale data, we propose an information-based weighting scheme to leverage the effect of multiple data sources in hyb...

Bibliometric methods for detecting and analysing emerging research topics

This study gives an overview of the process of clustering scientific disciplines using hybrid methods, detecting and labelling emerging topics, and analysing the results using bibliometric methods. The hybrid clustering techniques are based on bibliographic coupling and text mining and ‘core documents’, and cross-citation links are used to identify emerging fields. The collaboration network of t...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcomings in capturing the semantic concepts of text have motivated researchers to use...

Combining full text and bibliometric information in mapping scientific disciplines

In the present study, the results of an earlier pilot study by Glenisson, Glänzel and Persson are extended on the basis of larger sets of papers. Full text analysis and traditional bibliometric methods are serially combined to improve the efficiency of the two individual methods. The text mining methodology already introduced in the pilot study is applied to the complete publication year 2003 of the...

Document Clustering Based on Ontology and a Fuzzy Approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, the unsupervised classification of similar documents into different groups, is one of the important techniques of text mining. The most important step...

Publication date: 2009